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In this paper, we analyse large random Boolean networks in terms of a constraint satisfaction 
problem. We first develop an algorithmic scheme which allows to prune simple logical cascades and 

S under-determined variables, returning thereby the computational core of the network. Second we 
apply the cavity method to analyse number and organisation of fixed points. We find in particular a 
phase transition between an easy and a complex regulatory phase, the latter one being characterised 
by the existence of an exponential number of macroscopically separated fixed-point clusters. The 
C/j ' different techniques developed are reinterpreted as algorithms for the analysis of single Boolean 

networks, and they are applied to analysis and in silico experiments on the gene-regulatory networks 
of baker's yeast (saccaromices cerevisiae) and the segment-polarity genes of the fruit-fly drosophila 
melanogaster. 

I PACS numbers: 05.20.-y, 05.70.-a, 87.16.-b, 02.50.-r, 02.70.-c 

o 

, O, , I. INTRODUCTION 

One of the most surprising results of sequencing the human genome is the small number of genes found: We have 
only about 25 000 genes, compared to, e.g., the about 19 000 genes of the simple worm C. elegans pj. This implies 
that the higher complexity of the human organism cannot be understood in terms of the number of genes. In addition, 
all different cell tissues - humans have more than 200 - carry the full genetic information, but the corresponding gene 
^vq , expression patterns, i.e. the set of genes which are actually transcribed into mRNA and translated into proteins, 
are very different from one tissue to another. In this sense, different cell types can be understood as different robust 
expression states of the full genetic information. Simple mono-cellular organisms, in contrast, have generally only one 
expression state. 

The key process hereby is gene regulation. The transcription of a gene into mRNA is realised by a RNA polymerase. 
The action of this molecular machine is enhanced or suppressed by the presence of transcription factors (TF) binding to 
the DNA region upstream of the gene. TFs are themselves proteins, encoded in other genes. This defines a (directed) 
^ , regulatory interaction from the gene coding for the TF to the regulated gene. Only if the first one is expressed, the 
TF is present in the cell, and can thus up- or down-regulate the gene expression. The set of all regulatory interactions 
Q \ is called the gene-regulatory network (GRN). 
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FIG. 1: Schematic representation of the regulation of a gene C by two other genes A and B who code for transcription factors. 
These proteins bind to the promoter region of gene C, regulating thereby the binding of DNA polymerase II (the yellow 
semicircle) which is responsible for the transcription process. The right-hand side of the figure illustrates the Boolean model 
for this regulatory interaction: Genes are represented by binary variables, and their combinatorial control of the transcription 
of C is coded via a Boolean function. 

In the last two decades a large wealth of new data about the genome- wide organisation of gene-regulatory networks 
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has been and is still being collected [2j. It has become clear that in many cases biological functions cannot be 
identified at the level of single or few genes and proteins. Constructing a detailed biochemical model of an entire 
cell by analysing each gene and the nature of its interactions with others one by one appears, however, to be an 
intractable task. Even in the case of a relatively simple cell like yeast, the number of genes is slightly above 6000. In 
order to make progress in the understanding of the combinatorial aspects of gene regulation, it is thus crucial to model 
these systems on a coarse-grained level Genetic interactions have to be encoded in a discrete GRN: nodes of this 
network represent individual genes, and directed connections their regulatory interactions. Such models are expected 
to give valuable insight into the collective large-scale behaviour of the gene regulatory mechanism, and - through the 
understanding of these discrete models - it seems to be possible to infer and study emergent cooperative phenomena 
in real biological systems which cannot be understood at the level of single genes 0, Numerical simulations of 
these model allows further on efficient in silico experiments which may serve as a valuable input to the design of real 
experiments. Statistical physics provides important tools for the analysis of such complex systems, and in turn finds 
in them new technical challenges. 

In order to approach this problem, we need to choose a class of discrete models which on one hand is detailed 
enough to schematically encode the underlying biological processes, but on the other hand is simple enough to allow 
for a large-scale quantitative analysis. We therefore decided to restrict the present work to Boolean Networks (BN) 
which compress the full information about the transcription of a gene into a binary variable. This choice allows for 
a pretty complete characterisation of the topological and functional properties we are going to analyse below. Note 
however, that all applied methods can be simply generalised to more detailed descriptions, as long as the discrete 
nature of modelling is retained. 

BN are dynamical models originally introduced by S. Kauffman in the late 60s 6]. Since Kauffman's seminal work, 
BN have been used as abstract modelling schemes in many different fields, including cell differentiation, immune 
response, evolution, and gene-regulatory networks (for an introductory review see []| and references therein). In 
recent days, BN have received a renewed attention as a powerful scheme of data analysis, and modelling of high- 
throughput genomics and proteomics experiments 

The bottom-line of previous research is the description and classification of different attractor types being present 
in BN under deterministic parallel update dynamics [|| 0, 0] . Special attention was dedicated to so-called critical 
BN |l l| situated at the transition between ordered and chaotic dynamics. We follow here a complementary approach: 
our main goal is to study the statistical properties of fixed points of general BN, by recasting the original dynamical 
problem into a constraint satisfaction problem, and then using different techniques borrowed from statistical mechanics 
of disordered systems 0, 0] . A parallel approach was used in 0] , and some of the results are already discussed in 

m 

The paper is organised in the following way. In Sec. [H] we introduce the model focusing on Kauffman models with 
K = 2 inputs per Boolean function. In Sec 1 1 i 1 1 we introduce the notion of the computational core, and investigate 
two different percolation phenomena related to it. In Sec. IIVI we apply the cavity method and introduce different 
message passing algorithms that allow to characterise in detail number and organisation of fixed points. In Sec. 
we extend the picture to the cases K = 3,4 and present a conjecture for the general K > 2 case. In Sec. I VII we test 
our approach in two specific biological examples: the GRN of yeast (saccaromices cerevisice) and a subnetwork of the 
fruit-fly drosophila melanogaster. Sec. IVIll is devoted to conclusions of the present work and to an outlook on various 
open question. 

II. THE MODEL 

In the simplest and most coarse-grained setting, the expression level of one gene can be modelled by a Boolean 
variable: the gene-expression level Xi of gene i can be either (gene i is not expressed) or 1 (gene i is expressed) 11(1 . 
Although gene-expression levels (given by mRNA and protein concentrations) are certainly not Boolean variables, it 
turns out that a Boolean level of description is able to capture many essential features of gene regulation and moreover 
can cope with the often only qualitative nature of biological knowledge [l6j ]. This simplified view has become more 
adequate since the recent progress of biotechnology has made fundamental biological mechanisms experimentally 
accessible at a genome-wide scale. An example are gene-expression experiments, where the activity of a large number 
of genes is examined at the same time, using DNA-chip techniques whose output needs to be digitalised before further 
processing in a Boolean framework. 

Having N genes (symbolised by circles in Fig.|2), the global expression pattern can be described in terms of a N- 
dimensional Boolean vector x = (xx, . . . , xn) G {0, 1}^. In this framework, gene-regulatory constraints are modelled 
as Boolean functions (the squares in Fig. |2J) : the expression level x a of gene a is determined by a set of K expression 
levels x ai , ■ ■ ■ , x aK of the transcription factors a%, . . . , clk'- 

Xa -^a (*^ai j •••) ) (-0 
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with a G A C {1, N} running over all regulated genes (see for a an interesting study on the different strategies 
nature may adopt in order to implement Boolean regulatory rules). As shown in Fig. [2 not all variables need to be 
controlled by a Boolean function, i.e. in general we have \A\ = M with < M < N. Note also that the actual value 
of the number K of inputs to F a may depend on a, even if the major part of this work will concentrate on the case 
of constant K for reasons of clarity. 




FIG. 2: Factor graph representation of a small Boolean network: circles symbolise the variables, squares the Boolean functions. 
xi is an example for an external input variable, xa for a functional variable, whereas X3 stands for a transcription factor. 



The whole set of M = aN Boolean constraints together with the N variables dehne completely the network. The 
aim of this work is to introduce new methods borrowed from statistical mechanics of disordered systems. These 
methods are able to enumerate and classify the 7Vf p fixed points, i.e. the set of all vectors x fulfilling simultaneously 
all the M Boolean constraints of type Eq. . With this aim we introduce an Hamiltonian counting twice the number 
of not satisfied Boolean constraints: 

r hL(x*) — 2 ^ x a (B F a , . . . , x aK ) ^ E a {x a , x ai , . . . , x aK ) . (2) 

aeA aeA 

The symbol © stands for the logical XOR operation, i.e. each cost term contributes to the sum iff Eq. is fulfilled, 
and 2 otherwise. The prefactor two is introduced for later convenience, and the notation E a is simply an abbreviation 
which will be useful later on. The A/f p fixed points equal consequently the ground states of Hamiltonian provided 
that the minimum energy is zero, i.e. that all constraints are simultaneously satisfied. This embeds the problem of 
finding hxed points into the class of constraint- satisfaction problems, which have been recently studied extensively 
from the point of view of statistical physics, using various techniques from spin-glass physics, in particular the replica 
and cavity methods 0,0], but also percolation-type arguments. 

In this work, we concentrate our attention to random factor graphs subjected to two conditions: 

(a) Function nodes F a have fixed in-degree K and out-degree one. 

(b) Variables x a have in-degree at most one. This means that all regulating variables are collected in one single 
constraint F a (see Eq. (Q). 

Setting a :— M/N, the degree distribution of variable nodes approaches asymptotically 

a6 din ,i + (1 - a)5 dia ,o (3) 

i.e. the out-degree distribution is a Poissonian of mean Ka, while in-degree distribution is bimodal. Since in- and 
out-degrees are uncorrelated, the joint degree distribution factorises: p(d out , d- m ) = p out (d out ) p m (dj n ) . Random factor 
graphs are obviously a drastic oversimplification for a true GRN, since the available data show evidence for a scale-free 
p ont (d ou t) and a large but concentrated in-degree distribution of function nodes more compatible with an exponential- 
like form 0, 0| . Nevertheless, this simplified model allows for many detailed analytic predictions that can guide our 
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FIG. 3: Schematic representation of the three types of variables: external input, functional, and regulatory. 



comprehension in more realistic and interesting cases, and is a well-controlled testing ground for techniques borrowed 
from statistical mechanics. Such techniques are not limited to random graphs and can be easily extended to deal with 
more realistic cases. 

In general, we can distinguish three sets of variables as displayed in Fig. 01 

• External input variables: There are N — M = (1 — a) N variables which are not regulated by any function. They 
represent external inputs to the network, as e.g. chemical concentrations of non-proteins. We do not consider 
these external inputs as dynamical variables of the system. In the original definition of Kauffman networks they 
are in fact included into the definition of the BN itself: This definition works at a — 1 but allows for constant 
functions, whose outputs are the analogues of our external input variables. 

• Regulatory variables are all those variables which are regulated and regulate. They can therefore be considered 
as transcription factors. 

• Functional variables: There are e~ 2a N variables which do not regulate any other function. These are to be 
identified with functional proteins not involved in transcriptional regulation, but acting instead as enzymes, 
membrane channels etc. 

One has to note that in a random network there are some variables which belong theoretically to both the external 
and the functional variables because they are isolated. They contribute only trivially to the behaviour of the network 
and can therefore be neglected in the discussion. 
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TABLE I: Truth table for all 16 boolean functions of K = 2 inputs. 



We now have to specify the functions acting on top of the random topology defined so far. There are 2 2 = 16 
Boolean functions, which can be grouped into 4 classes [l9| : 

(i) The two constant functions. 

(ii) Four functions depending only on one of the two inputs, i.e. Xx,x\,X2,x~2- 

(iii) AND- OR class: There are eight functions, which are given by the logical AND or OR of the two input variables, 
or of their negations. These functions are canalising. If, e.g., in the case F(xi, x%) = X\ A x 2 the value of x\ is 
set to zero, the output is fixed to zero independently of the value of Xi- It is said that x\ is a canalising variable 
of F with the canalising value zero. 

(iv) XOR class: The last two functions are the XOR of the two inputs, and its negation. These two functions are 
not canalising, whatever input is changed, the output changes too. 

We keep in mind that the case of Boolean functions depending on exactly K = 2 input variables is not general in 
gene regulation: genes, in particular in higher eukaryotes, are frequently regulated by much more TFs. However, the 
number of Boolean functions of K variables increases as 2 2 .A complete classification of Boolean function becomes 
intractable already for relatively small K > 4. For clarity we therefore concentrate first on true K — 2-functions 
only, i.e. on those in the AND-OR class and the XOR class. The organisation of the fixed points does not depend 
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on the relative appearance of the functions within each class, but only on the relative appearance of the classes. We 
therefore require xM functions to be in the XOR class, and the remaining (1 — x)M functions to be of the AND-OR 
type, with < x < 1 being a free model parameter. In this sense, for K = 2, the network ensemble is completely 
defined by a and x. Extensions to higher if-values are discussed below. 

III. THE COMPUTATIONAL CORE OF A BOOLEAN NETWORK 

Before coming to the analysis of the fixed-point structure of random Boolean networks, we present an algorithmic 
procedure to determine the computational core (CC) of a Boolean network under a given configuration of external 
variables. 

Many variables can be fixed simply by following logical cascades originating in the external variables, and the 
corresponding Boolean constraints are satisfied accordingly. Some other constraints can be fulfilled because the 
included variables are otherwise unconstrained, see below for a precise definition. However, a BN may contain a 
subset of equations which cannot be solved in such a simple way on the basis of local arguments only. We define the 
CC as the maximal subnet formed by these Boolean functions. 

In order to determine this core, we prune iteratively all the function nodes contained in the before-mentioned 
logical cascades, and all those containing enough under-constrained variables. If no such computational core exists, 
the determination of fixed points is just a trivial task. For a non-trivial organisation of fixed points (e.g. in clusters 
as found in the sections below) the existence of an extensive computational core is a necessary, although not sufficient 
condition. 



A. Propagation of external regulation (PER) 

Let us start our analysis by eliminating logical cascades which propagate the external information into the network. 
If both inputs to a Boolean function are external variables, then also its output is fixed by the external condition, 
propagating thus the external regulation to the output variable. As shown in Fig.^J this argument can be iterated, and 
the external regulation can also be propagated to variables depending not directly on external variables. Functions 
with fixed output correspond to fulfilled Boolean constraints, and they can be considered as being deleted from the 
network. Note that, in case of a canalising function, the output can already be determined by one input set to the 
correct value , i.e. the canalising one. We include also this propagation into the PER analysis. The remaining part 
of the BN will be called the PER core. 




FIG. 4: PER: Both inputs to the upper left Boolean function are external, so also the output is directly fixed. Once this is 
done, also the inputs to the second function are fixed, and again the information can be propagated. All variables in this small 
sample graph are therefore determined by PER, no core exists. 

Note that, due to the input-dependent propagation via canalising functions, the resulting PER core depends on the 
specific configuration of the external variables, and is no longer a purely topological property of the network. On the 
other hand, the PER core size is expected to be self-averaging: almost all external input configurations lead to the 
same PER core size. Note also that a PER core can exist only if the original BN contains feedback loops. The latter 
can, however, be broken during the pruning process if canalising functions are included. Directed percolation of the 
underlying factor graph is thus a necessary but by no means sufficient condition for the existence of a PER core. 



6 



Its size can be determined analytically using an iterative probabilistic argument (which becomes exact in the large 
N limit due to both the uniqueness of the PER core for a given external condition, and the locally tree-like structure 
of a random Boolean network). A function has to be deleted from the network if cither both inputs are fixed (i.e. the 
inputs are either external or regulated by an already eliminated function), or if only one of the inputs is fixed, but to 
its canalising value. Denoting the fraction of all Boolean functions which belong to the core by tt RE r, we can thus 
write 

1 - ttper = (1 - an PER ) 2 + i (1 - x) 2 (1 - a-K PER ) om PER . (4) 

The first term on the right-hand side describes the situation that neither of the inputs depends on a function of the 
core, the second term restricts to canalising functions (factor 1 — x) with exactly one output depending on a surviving 
function. The prefactor 1/2 gives the probability that the fixed input variable takes its canalising value. We thus find 
for the survival probability 

KPER = 1 - (1 - air PER ) 2 -(1-x) (1 - aiXpER) OCKp ER . (5) 

This equation has the obvious solution irp E R = 0, corresponding to the non-existence of a PER core. This solution is 
correct only for small a, at higher a a second solution exists. This solution appears continuously, and its appearance 
can thus be determined by linearising the above equation. We find easily the x-dependent threshold 

ap ER {x) = — J— , (6) 
1 + x 

which decreases from one for x = (pure AND/OR functions) to 1/2 for x = 1 (only XOR-class functions). At this 
line, a percolation transition takes place. For a-values below (or left) of the transition line no extensive PER core 
exists, almost all variables can be fixed by propagating the external condition via trivial logical cascades. Fixed points 
of the network are thus trivial consequences of external regulation. This is not longer true for larger densities of the 
network. There an extensive part of the functions survives the removal procedure, a PER core exists. 

PER is related to the percolation of information studied in There, the relative evolution of two slightly 
different initial configurations under synchronous update is studied. This can analogously be studied by looking to 
the propagation of a bug, i.e., of a single flipped variable Xi in a fixed point configuration. The main issue is the 
number of variables which are directly influenced by this variable flip, or, more precisely, the expected number n of 
Boolean functions depending on Xi which become unsatisfied by the bug. On average, there are 2ax non-canalising 
functions depending on a randomly selected Xi, and 2a(l — x) canalising ones. The first become always unsatisfied 
if one variable is flipped, the canalising ones only if the other input does not carry its canalising value (i.e. with 
probability 1/2). We thus find 

n = 2ax + a(l — x) = a(l + x) . (7) 

If this number n is smaller than one, the bug is healed out with high probability after just a few iterations, the 
system is dynamically in its frozen phase. For n > 1 the number of new bugs increases exponentially, and the system 
diverges exponentially from its initial fixed point: we are in the chaotic phase of the BN under synchronous update. 
Marginal stability is held exactly at the PER core transition line ap E R(x), establishing thus the connection between 
the so-called critical networks and propagation of external regulation. 



B. Leaf removal (LR) 

Whereas PER removes trivial logical cascades from the Boolean network, LR is able to eliminate those functions 
which are trivially satisfiable due to otherwise unconstrained variables [20|. Such variables are leaves, i.e. variables of 
total degree one. Depending on whether they are inputs or outputs to the only function they are involved in, we also 
speak of in- and out-leaves. 

In Fig.[5]the cases of removable Boolean functions are summarised. The first one is an out-leaf. Depending on how 
the inputs are set, the output variable of the function can be set - without influencing any other function. Similarly 
in the second case, where both inputs to a function are leaves: If the output is fixed to some value by consistency 
with the rest of the network, the inputs can be adjusted easily (remember that no constant functions are present). 
For the case of a XOR function, even one input being a leaf is sufficient for removal: Whatever the value of the other 
two variables is, the function can be satisfied by selecting the leaf-input accordingly. 

The removal procedure is not restricted to the leaves of the original graph, but it can be iterated until no removable 
functions are left. The remainder is called the LR core. Its size can again be calculated on a random Boolean network 





XOR 



FIG. 5: LR: Cases, where a function can be pruned since it can be satisfied easily due to otherwise unconstrained variables. 
Whereas the first two cases, i.e. the out-leaf and the double in-leaf, are purely topological and valid for any Boolean function, 
the rightmost figure corresponds solely to the non-canalising XOR-type functions. 



using a simple iterative scheme. First we give a self-consistent equation for the probability, that a variable becomes 
an out-leaf (tout) or an in-leaf (t; n ) at a certain point of the removal procedure. Let us first consider out-leaves. They 
appear if all functions depending on the variable are already removed. This probability reads 



tout — 



_ 2a (2ax)^(2a(l-x))^ 
di! d 2 \ 

cxp{-2a(l - i out )(l - xt in )} , 



[l-(l-<out)(l-*in)] dl t[ 



d 2 
out 



(8) 



where d\ is summing over the XOR-type functions, di over the canalising ones. We have further on taken into 
account that a XOR-type function can be removed when either the other input or the output are a leaf, whereas an 
AND/OR-type function is removable only iff the output is a leaf (the case of two in-leaves is obviously excluded since 
one of the inputs is the fixed one in this consideration). For an in-leaf to appear, also the input function (if existing) 
has to be removed, leading to the slightly modified expression 



tin = (I - a + ax(l - (1 - t in ) 2 ) + a(l - x)^) t out 
= (1 - atxQ. - tin) 2 - a(l - x)(l - 4)) t ou t ■ 



(9) 



These equations close in t out and ti, and allow us to write the survival probability for an arbitrary Boolean function 
in the network: 



TTLR = X(l - t ou t)(l ~ tin) 2 + (1 - X)(l - t out )(l - tfj 



(10) 



where the first term accounts for the fact that an XOR-type function survives if and only if none of its in- or outputs 
becomes a leaf at a certain point, whereas an AND/OR-type function survives if neither the output becomes a leaf 
nor both inputs are simultaneously leaves, cf. Fig. [5] The remaining fraction of XOR-class functions can be evaluated 
as 



xlr 



.T(l-tout)(l-tin) 2 



X{1 - tout)(l - tin) 2 + (1 - X)(l - t out )(l - t 2 J 

X 



(1 



(11) 



which is smaller than x because 1 — tf n > (1 — ti n ) 2 - This observation reflects the fact that LR is more efficient on 
non-canalising functions due to the rightmost removal case in Fig. 

But does an extensive LR core exist at all, i.e. is n^R larger than zero? The obvious solution t out = t in = 1 of 
Eqs. (|8I9|I - every variable becomes a leaf at a certain point - leads to ~klr — 0, i.e. predicts the non-existence of a 
core. For high a, we find a non-trivial solution which appears discontinuously at a threshold cilr{x) which changes 
monotonously from a_L/?(0) = 1/2 to a(l) = 0.8839. The discontinuous jump in the core size is maximal for the pure 
XOR-type model (x = 1): the value of ~klr changes from zero to about 0.386. 
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To determine the threshold for arbitrary x, we observe that Eq. © can be used to eliminate one of t ou t and t; n - 
For the following, it is more practically to write ti n as a function of t ou t- We find 

kn = AT ~ 9 n \ V , J (I - 2a^ out ) 2 - 4(1 - a)a(l - 2s)tg ut =: T in (t out ,a) , (12) 

2a(l - 2x)iout 2a(l - 2x)t out v 

where the sign in the solution of the quadratic equation was chosen to guarantee that ti n £ [0, 1] for t ou t £ [0, 1]. 
Plugging this into Eq. © we obtain one single equation 

tout = exp{-2a(l - t ut)[l - xT in (t out ,a)}} =: T out (t out ,a) (13) 

for tout as a function of a. At the LR transition point, two solution branches different from one appear discontinuously, 
directly at the transition point with infinite slope in a. The lower one is the physical solution t out (a). Let us denote 
a(tout) its inverse function. Plugging it into Eq. (|13[) . the latter becomes identically fulfilled. Deriving both sides with 
respect to t ou t, we thus find 

-i d _, , , dT ou t(t ou t , ol) dT ou t (tout j Q() u, >. ✓ 1 ,v 

1 = 37 — T out (tout, a (tout)) = 57 + E a (*out) ■ (14) 

at out dtout ou 

Directly at the LR transition, the derivative of a(t ou t) vanishes (due to the infinite slope of its inverse function 
tout (ft))- The partial derivative with respect to t ou t can be evaluated, and we find that the transition line ulr{x) is 
given self-consistently by the equations 

tout = exp{-2«L_R(l - tout)[l - a;T in (tout, OtLR)]} 

1 = 2 a LR t ou t [1-X Tjn (t out , CtLFt) + X(l - tout )9f out Ti n (t ou t, OLLr)] ■ (15) 

The result is summarised in Fig. 



C. Combining LR and PER 



The two removal procedures can be combined to obtain the computational core of a network (for a given config- 
uration of the external variables, cf. the discussion at the beginning of Sec. 1111 AT) . We first have to iterate the LR 
procedure to obtain the LR core, and then to apply PER in order to understand which part of the LR core is trivially 
fixed by propagating the external information via logical cascades. Note that we cannot simply change the order of 
the two graph reduction processes: whereas the argument for pruning Boolean functions depending on one or two 
in-leaves relies on the fact that these are not otherwise constrained, PER propagates constraints and fixes at least 
some of the in-leaves of the PER core to non-arbitrary values. 

An immediate consequence of the combination of both procedures is that the region showing a core is shrinking 
in the phase diagram to a part of the intersection of the two regions with PER core and with LR core. On random 
Boolean networks a complete analytical determination of the percolation line and the size of the computational core 
is possible, the technical details are explained in App. ^ Here we will only review the main results: 

• For x < 0.636, a continuous emergence of the computational core can be observed at some acc( x )i cf. the 
dash-dotted line in Fig. This line is situated deep inside the region where a LR core exists, and approaches 
the PER transition line if we approach x = 0, i.e. the purely canalising case, from above. 

• For x > 0.636, the existence of a LR core implies automatically also the existence of a non-trivial computational 
core, we consequently find otcc{x) — ollr{x). The transition is discontinuous, i.e. the size of the core jumps at 
the transition from zero to a finite fraction of the full Boolean network. 

• Both regimes are separated by a tricritical point at (a,x) — (0.785,0.636). 

As discussed above, the determination of fixed points in the region without a computational core is an easy com- 
putational task. Note that in particular the critical networks studied in the context of synchronous update dynamics 
are completely located within this easy phase. The situation can become more complicated in the presence of a com- 
putational core, since simple local methods to construct fixed points (based on extensions of the discussed removal 
procedures) fail to determine all variables. The percolation arguments presented so far are, however, not sufficient to 
show that structure and organisation of fixed points really changes as soon as an extensive computational core of the 
full network exists. Answering to this question is the main goal of the following two sections. 
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0.8839 




FIG. 6: Phase diagram for the percolation of the computational core. We give the transition lines for PER, LR and the 
combination of both. Left of the transition lines, no core exists, right of the lines, a non zero core exists. The lines of LR and of 
LR+PER merge in the tricritical point (0.785,0.636), below this point the core appearance is continuous, above discontinuous. 



IV. THE CAVITY APPROACH 



In this section we develop tools that allow us to enumerate the different fixed points of a Boolean network, and 
to classify their organisation. These tools are borrowed from statistical physics, based on the identification of fixed 
points with the zero-energy ground states of Hamiltonian (|5J). More precisely, we use the cavity method [l2L Il3| as 
recently developed in the framework of disordered systems in order to solve statistical mechanics problems defined on 
finite connectivity graphs, and reformulate it in a way able to deal with Boolean Networks. 

As seen in Sec. IIIII we are to identify an extended region in the (x, a) plane, where an extensive computational core 
of BN exists. Inside this region the mechanism of regulation - given not only by trivial cascades of logical implications, 
but also by non trivial loop structures - starts playing a relevant role. More sophisticated techniques are needed to 
study this phase. 

A. Information flow in the network and definition of cavity biases 

The first step is to transform the Hamiltonian of Eq. © from Boolean variables xi <E {0, 1} to Ising spin Si — 
2xi — 1 6 {±1}. It is easy to see that each energy contribution in Eq. J5J can be rewritten as 



Ej 1 ,.J 2 ,.J 3 (si,S 2 ,S 3 ) = 1-JiSi 



(1 + J 2 s 2 )(l + J3S3) 



AND - OR class (16) 



Ej(si 7 82,82) = 1 — JS1S2S3 XOR class , (17) 

remember that the first entry (si) in these interaction terms corresponds to the output of the Boolean function, 
whereas the others (32,3) describe the inputs. 

In the first of these two equations, J\ = ±1 switches between the OR (J\ = +1) and the AND (Ji = —1) sub- 
families, while the other parameters ^2,^3 = ±1 act as negation of the input variables. As an example, the OR 
function is given by (J\, J2, J3) — (1, —1, —1). It is easy to check that the zero-energy configurations are (si, sa, S3) = 
( — 1,-1,-1) (corresponding to two false input variables and one false output variable), and (1,-1,1), (1,1,-1), 
(1, 1, 1) (corresponding to at least one true input variable, and consequently a true output variable). The remaining 
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four configurations lead to energy two, and are thus forbidden in a zero-energy ground state. There is only a unique 
parameter J in Eq. i|17|) : it implements the XOR function (J = 1) or its negated counterpart XOR (J = —1). The 
parametrisation is displayed schematically in Fig. [7] 



S 3 CU 



3 OR 





►Os, 



FIG. 7: AND-OR , XOR function nodes. 



In the cavity formalism developed originally in |21|,|22j], one sets up a self-consistent iterative procedure which allows 
to infer the marginal probabilities for single spins or small groups of them from the original BN. In each step, a cavity 
is carved into the network by taking out one variable Si, and all the Boolean constraints E a containing it either as 
an input or output variable (denoted formally as o G i). The resulting reduced network is called the cavity network 
BNj . Further on, the properties of the neighbours Sj of Si in all ground-state configurations of BN^ are characterised 
by integer cavity fields hj^ a (with a £ i, j G a\i). More precisely, the minimum energy of BN^ with arbitrarily fixed 
Sj can be written as 



const — 



(18) 



j: j£a\i, a£i 



This equation contains the basic assumption of the cavity method: all spins Sj on the cavity system are statistically 
independent. In large random networks this is expected to be asymptotically true due to their locally tree-like 
structure, in finite graphs this may still be a reasonable approximation. An immediate consequence of this equation 
on all minimum-energy configurations of BN^ including the optimisation over all Sj can be drawn: a positive cavity 



field hj^ a > indicates that the corresponding variable Sj is fixed to 



indicates that Sj can take both values ±1, 



whereas hj^, a < implies s. 



-1 on the cavity network BNj, hj- 
-1. 







Imagine now, that we put back the spin Si and one constraint E a into the cavity graph, and that we fix Si to one 
of its two values. The energy shift in ground state energy is (up to a constant) given by, cf. Fig. |H1 

min E a (si, Sj,s k ) - (hj^ a Sj + h k ^ a s k ) = -if™^ - SiU 1 ^ 

Sj,S k 

min E a (si, Sj, s k ) - (hi-> a Si + hj-> a Sj) = - ^u"^ 



min E a (si,Sj,s k ) - {hi^ a Si + h k ^ a s k ) = -w^j - Sj<^!j 



(19) 




FIG. 8: The cavity bias u sent from function node a to site i as a function of the cavity fields hj^ a and ht^ a - 

The u are called cavity biases. Its physical meaning is clear: in a minimum-energy configuration, the variable 
has to take a value parallel to the bias. Note that the initial BN is directed, as Si is given by the output value of 
a Boolean function F a . Here we see that cavity fields and biases arc defined also in the directions opposite to the 
directed nature of the BN. Within the notation of Eq. I|19|) . in-biases act into the direction of the links of the factor 
graph, out-biases against it. In analogy, we define out-fields to act along the graph direction, in-fields against |36|.As 



11 



a general consideration, it is not surprising that there should be cases when information flows against the directed 
networks arrows: imagine a configuration in which si = 1 being output of an AND gate. This will immediately 
imply that S2 = S3 = 1, since this is the only input configuration leading to the desired output. Back-propagation 
of information is therefore plausible. We will see that there are cases where both conditions (presence or absence of 
back propagation of information) are met. 

At this point we can explicitly perform the minimisation in the above equations and express the ws and us as 
functions of the messages hj^ a . Using the notation of Eqs. Ijlfijl and (jl7|l . we obtain for the two functional classes: 

• Canalising Functions (AND-OR family): 



uj\h 2 ^ a ,h 3 ^ a ) 


= Jl(0(- 




ah 3 ^ a ) ~ 0(-/3^3->o)) 




= \h 2 ^a\ 


+ \h 3 ->a\ 


- \uf\h 2 ^ a ,h 3 ^ a )\-29{-J 2 J 3 h 2 ^ a h 3 ^ a )9{J 3 h 3 - >a 


Uj(h 2 ^a, h 3 ^ a ) 


= JM- 


-J 2 h 2 ^ a ) 


+ 0(-J 3 h 3 ^ a ) - 9(J 2 J 3 h 2 ^ a h 3 ^ a )) 




= \h 2 ^a\ 


+ \h 3 ^a\ 


- |tt™(ft2-«j, h 3 ^ a )\ 



• Non-Canalising Functions (XOR family): 

Uj ut {h 2 ^ a ,h 3 ^ a ) = JSign(h 2 ^ a h 3 ^ a ) 

w°f(h 2 ^ ai h 3 ^ a ) = \h 2 ^ a \ + \h 3 ^ a \-\u°? t {h 2 ^ a ,h 3 ^ a )\ 

uy(h 2 ^ a ,h 3 ^ a ) = JSign(h 2 ^ a h 3 ^ a ) 

U#(fol->o, fc3-a) = \h2^a\ + \h 3 ^a\-\u U j{h 2 ^ a ,h 3 ^ a )\ (21) 

Since flipping the spin on the right-hand side of Eqs. (|19f) may lead to energy differences 0, ±2, cavity biases take 
integer values u in ' out e {-1,0, +1}. 




FIG. 9: The cavity field hi-t a is the sum of all cavity bias pointing to site i except that of function node a. 

In order to close the equations for the cavity fields and biases self-consistently, we also have to give an expression 
of the h in terms of the u. The cavity field hi—, a can be easily determined by putting back into the cavity graph BN^ 
all but one constraints (b S i \ a). The interpretation of the fields in terms of the energy difference due to flipping Sj 
leads immediately to 

h^ a = 2J v>b-M , (22) 

C f. ei in mm) and see Fig. |^1 The true field acting on Sj in the full problem can also be calculated from the 
cavity biases by putting back all constraints: 

Hi=J2 u ^i- ( 23 ) 

Eqs. 120I23|I are called warning propagation equations (WP). They can be exploited algorithmically: first Eqs. (|20I22() 
are iterated until a solution for all cavity fields and biases is found, and then the Hi are determined. As discussed 
above, a positive field forces the corresponding spin to take value +1, a negative value —1. Only sites with zero fields 
are not yet fixed in value, and can be fixed via a graph decimation process, cf. [T3. Il3j|. 

This iterative solution works, however, only if there are at most a few solutions to the WP equations, corresponding 
in a physical language to only one or a few thermodynamic states as zero temperature. This assumption is equivalent 
to replica symmetry. We will discuss this case in detail before coming to the possibility of multiple solutions to WP 
and broken replica symmetry. 
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B. Replica symmetric solution (warning propagation) 

One can push forward the analysis considering the average over the random BN ensemble specified by the average 
density a of functions per variable, and the fraction x of XOR-type Boolean functions. In this case we can define 
two probability distributions Q ont (u) and a Q ln (u) representing the average probability of finding an out /in bias of 
strength u and satisfying the following self-consistent set of integral equations: 



k,k',b,{J} 



r ut (u) = (J dhdgP$(h)P™(g) S(u-u\f } (h,g) 
Q in (u) = /fdhdgP^f(h)P^,(g)d(u-uf J} (h,g) 



k,k',b,b',{J} 



(24) 



defining 



k 



P'k n (h) = J f[du a Q° ut (u a ) S 

a—1 \ u, — j_ / 

k i k \ 

Pkf(h) = / duQ in (u) 11 du a Q ont (u a ) 5 lh - £)u - bu\ (25) 

J a=l \ a=l / 

and where {-)k,b.{j} is the quenched average over p out (k) — exp(—Ka)(Ka) k /k\ (the site out-degree distribution), 
p ln (b) = aS(b; 1) + (1 — a)S(b;0) (the site in-degree distribution) and the probability distribution over the different 
function node types p({J})- 

Assuming that the p({ J}) is flat for both canalising and non canalising function, we force fields the distribution 
to be symmetric under u «-» — u. Recalling that u G { — 1, 0, 1} one can assume that a single number r\ £ (0, 1) can 
parametrise the cavity bias distribution: 

Q™\u) = v out 6(u) + ± J—L[S(u-l) + 5(u + l)] 

Q in (u) = v in S(u) + (1 ~ ^ [S(u - 1) + 6(u + 1)] (26) 

Inserting this simple functional form in Eqs. (|25(l . and after some algebra, one obtains the following relations, for the 
graph averaged P in ' out (^) = (P^W)^ 

P in {h) = e~ Ka ^'' imt) I h (Ka{l~r 1 ont )) 
P°^(h) = e-^Ci-fl™**) {rf» Ih (Ka(l - ry out )) + (1 - rf») [l h+1 (Ka(l - r? out )) + I h -i(Ka(l - V ° ut ))] } (27) 

where Ih(x) := X^/i( x /2) 2 V[C — h)lll] is the h th -order modified Bcssel function. One can close analytically the 
system of equations considering the weight in zero of the two P(h) in the two limiting cases: 

• A graph made of only canalising functions (x = 0): 

277° ut -l = y/l - V in \l - e -2«( 1 -"° Ut ) 

rf n = e- 2 ^ 1 -" "') {l (2a(l - n out )) - a(l - V m ) [7 (2a(l - 77° ut )) - h(2a(l - O)] } (28) 

• A graph made of only non-canalising functions (x = 1): 

1 — 77° Ut = y/l - T? n [l - e-^-rr 1 ) 

1 - v/T^ 7 = e- 2 ^ 1 -^ {J (2a(l - 77 out )) - a(l - V in ) [l (2a(l - r? out )) - 7 1 (2a(l - n out ))} } (29) 

These equations are known as density evolution equations in Information Theory |23j |. However, it turns out that 
the only solution for both equations is the trivial paramagnetic one, i.e. rf n — i] out = 1. Therefore, WP gives no 
interesting information in order to find fixed points (there are no biases), nor on their number, nor on their separation 
in phase space. 
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C. Belief Propagation 



To overcome this limitation, we have to go beyond the WP approximation introduced so far. The idea is to introduce 
in the study of BN real valued messages in [0, 1] measuring the probability that a variable takes a certain value in the 
ground states of the (cavity) problem. This resolves finer the case of vanishing cavity fields and biases - according to 
the above solution this means of all of them. We can introduce a refined iterative method - called Belief Propagation 
(BP) 0, US US ~ that allows to define the fraction of zero energy configurations having, say, Xi = 0. It is a rather 
general algorithm that allows to calculate marginal probabilities of problem defined on factor graphs. Note that also 
BP is based on the assumption of replica symmetry, i.e. of the existence of a single pure state. 

Calling a one of the clauses where Xi appears and E a ({xi}i Ea ) — 2x a ® F a (x ai ....,a K ) as in Eq. we can introduce 
the quantities: 

• (j,i-> a (xi): the probability that variable i takes value Xi in the absence of clause a. 

• m a ^i(xi): the non- normalised probability that clause a is satisfied when variable i takes value Xj. 
The two quantities satisfy the following set of equations: 



(Xi) = 1 - \ E a{{xi}lea) Vl^a(xi) 



{xi}iea\i L J lea\i 

Hi^a(Xi) = G\-> a Y[ m b ^i(Xi) (30) 
bei\a 

where C^ a is a constant enforcing the normalisation of the probability distribution fii^ a (xi). The iteration of Eq. 13UI) 
provides the correct marginal on a tree, where it can be shown that the full probability measure V(x) of configurations 
x G {0, 1}^ satisfying all constraints can be expressed as a product of site and clause marginals |25ll2r1|: 

N 

V(x) =l[P a ({x l } lea )l[P i (x i ) 1 - d * (31) 

aeA i=l 

where the marginal probability distribution Pi(xi) of variable i is the fraction of zero-energy configurations having 
variable i set to value Xi, and the joint probability distributions P a {{x{\iea) of all variables belonging to function 
node a is the fraction of zero-energy configurations having variables iGa set to {xi}i £a . Further on di denotes the 
total degree of variable i, summing its in- and out-degree. 

The Entropy S, i.e. the logarithm of the number of fixed points A/f p , can be also computed in terms of the marginal 
introduced in Eq. i|3T)l as: 

N 

S = -^P(£)mP(£) = - J2 J2 p a({xi}iea)\nP a {{xi}i ea )+YY^-^ P ^ lnP ^ ( 32 ) 

x aeA {xi}i ea i=l Xi 

We can now express the marginals in terms of the messages introduced in Eq. (|30|l 



Pa({Xi}i £a ) 



-^Ea{{xi}iea) 



Y\^i^,a{Xi) 



Pi(xi) = Ci Y[m a ^i(xi) (33) 

It is convenient to parametrise probabilities fii^ a (xi) in terms of numbers r\i^ a £ [0, 1] as /ii^ a (xi) = r]i^ a S(x] 0) + 
(1 — rii^ a )8(x; 1) and then eventually express the entropy S in terms of message quantities: 

aeA ieX aeA iea 

In analogy with the case of Warning Propagation, it is convenient to classify the set of messages {m} (one for each 
edge) as type in and type out, according to their orientation with respect to the graph. WP equations can be retrieved 
as a particular case where on each link either r\i^ a £ {0, 1} or r\i^ a — 1/2- In particular, we have seen in the previous 
paragraph how the only WP solutions are the completely unbiased ones, corresponding to rj^ a = 1/2 on every link. 
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Let {m out } C {m} be the subset messages flowing against the orientation of the graph, i.e. messages flowing from 
functional node a to the regulatory sites S2 and S3, as the case displayed in Fig. Q Let also {m m } C {m} be the 
subset of messages flowing from functional nodes a to the regulated site: as in the case of to ->i as displayed in Fig.0 
Obviously {to} = {m out } U {m in }. 

The directed nature of the graph shows unexpected statistical properties of the solutions of the BP Eqs. I|3U|) in 
all cases where zero energy ground states exist. It turns out that the set of fixed points obtained by iterating until 
convergence Eqs. (JSOJ, is characterise d by the property that for each link a — > i, m^^O) = m^^l), i.e. the set 
| TO out| j g a g am completely unbiased [33. The only biased messages belong to {m 111 }. Thus, BP cannot see back 
propagation of information on this type of directed networks. In this case an interesting series of simplifications hold: 

• Since messages {to} are non normalised, we can set both components of messages belonging to {m out } to 1. 



• The marginal Pi{xi) is then specified by the only contribution of message in: Pi(xi 
the function node regulating i. 



Cim 1 ( ^_ ti (xi), where a is 



Eqs. i|3U[) can be therefore expressed in a more compact form: 



Pi(xi)= ]r n p ^ 



1 - -E a (x l ,{xi}) 



(35) 



It is interesting to note that in the present case of directed graphs, one can directly iterate marginal probabilities 
instead of passing through the intermediate step of cavity messages {to, h} as happens to BP equations on general 
graphs. The rationale for this simplification is inherent to the peculiar asymmetric form of E a = 2x a ®F a (x ai , x aK ) 
that makes the factorisation of the marginal 



P(,%a ; X ai , ■ ■ • , X aK ) P(x a Xai , ■ ■ • , X aK )-P(x ai , . . . , X aK ) 



1 ^PaiXai X ai ^ . .. ^ X aK ^) 



K 



(36) 



2=1 



exact on a tree (again iff only zero energy configurations are taken into account, otherwise the last step in Eq. Il3fi|) 
would not hold). By inserting the factorised form 1)36(1 into the the full probability distribution P(x) of Eq. El one 
can show, after some simple algebra, that: 



v{x) = n 



aeA 



1 2 Pa {Xa 7 X ai , 



•, Xaj. 



(37) 



ieEiv 



where EIV is the set of external input variables, i.e. the set of all N — M non-regulated variables (see Fig. J3J). In 
the case that non-regulated variables are not biased by external fields, we have Pi(0) — -Pj(l) = 1/2 for all i G EIV. 
The entropy then turns out to be, with some simple algebra: 



S 



an 



, aeA 



n Kb**) 



ieEiv 



(N - M) ln(2) 



ieEiv 



In Pi (xi 



(38) 



This result implies that fixed points satisfying all Boolean functions exist for all N > M i.e. for a < 1. The line 
a = 1 can be then considered as the SAT/UNSAT transition line. The entropy value shows that each configuration 
of the (1 — a)N external input variables leads, on average, to one stationary point, i.e. the state of all internal and 
functional variables is mainly determined by the external condition, even in the case of a PER core (see Sec. IA 2J) . 
where this fixing is not just a simple logical implication. This result is far from being rigorous, since we do not know 
at present how to prove that the only unbiased messages are those flowing from functional nodes to regulated sites. 
We have checked numerically the result presented in Eq. running the BP equations on single sample at different 
values of a as displayed in Fig. UHl 



D. Survey Propagation 

Previous approaches rely on the assumptions that all zero energy configurations are arranged into one single cluster 
in phase space, i.e. that cavity fields and biases are well-defined on the basis of the WP equations alone. However, if 
the phase space breaks into many pure states (i.e. macroscopically separated clusters of zero-energy configurations), 
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FIG. 10: Entropy density s = S/N vs. average graph connectivity a. Dots correspond to the average value obtained by running 
the BP equations on samples of N = 100000 at different node concentrations x. It turns out that the entropy s is not depending 
on how one fix the different function node type but only on the topology of the graph, i. e. just on the density of non regulated 
sites (N - M)/N = 1 - a. 

each of these states corresponds to one non-trivial solution of WP. With high probability, a simple iteration of the 
WP equations does not converge to one of these solutions, but just to the trivial one. The relevant order parameter of 
the model is therefore not given by a single field/bias per link, but by the distribution of both fields and biases. This 
distribution is not to be confused with the average order parameter described by the density evolution equations, but 
it is a broad distribution over the values of the biases (fields) sampled over all clusters (pure states) for one network 
realisation. In the simple case of a single pure state, i.e. in the replica symmetric phase, single-site probability 
distributions become delta functions and the order parameter simplifies to a single global probability distribution. In 
the more general case, the order parameter averaging over the graphs ensemble becomes a probability distribution of 
probability distributions. Let us start defining (with an abuse of notation with respect to replica symmetric equations) 
the survey Q a ^i (u) from function node a to site i as the histogram of cavity bias u a ^ over all N c i different clusters: 




(39) 



The Q distributions are self-consistently computed via the Survey Propagation equations (SP) [l2t ll3L l2fj| : 





(40) 



where Ci_+ a are normalisation constants, and both functions u are defined in Scc lIV^l 




(41) 



b£i\a 



b£i\a 



represents the energy shift due to re-insertion of site i and constraint a into the cavity graph, along the itera- 
tion/minimisation procedure described in Eqs. lfT§l) . (|2Ut and (|2T)l . The appearance of the reweighting parameter 
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y > allows therefore to scan different energy levels of metastable states. It acts similarly to the inverse temperature 
in the usual Boltzmann weight: upon increasing the value of y, the probability measure concentrates on decreasing 
levels of energy, and the limit y — > oo corresponds to zero-energy ground states. Indeed, one can relate y to the 
derivative of the complexity £ with respect to the energy shift, where the complexity is defined in analogy to the 
entropy as the logarithm of the number of thermodynamic states at given energy. In this case, forcing y to very large 
values implies that the only configurations that contributes to surveys are those with vanishing energy shifts. 

The above formalism is formulated for a single sample analysis, and in this form it allows for a straightforward 
implementation of the SP algorithm, which amounts to iterate Eq. I|4U|I on real networks, for each function node a 6 A 
and each variable i involved in a, from a random initial condition until convergence. However, it is easy to generalise 
the SP equations to deal with average quantities on the random graph ensemble introduced in Sec. ^ General 
considerations on the existence of a well defined thermodynamic limit [53, imply the existence of a functional 
probability measure Q[Q(it)] describing the statistics of the Q(u) on the different clusters. Again, as in the case of 
the WP and BP equations of Sec . II V Al and IrvTH respectively, it turns out to be convenient to classify the surveys as 
type in and out. The self-consistency equations for the Q[Q out (u)] and Q[Q ln (u)] then become: 



Q[Q ou \u)} = If dQ[Q in ] [] dQ[Q° ut ] II <*Q[Q**\ S [Q out (u) - Q out (u | {Q° ut }, {Q° a > *}, Q 



a' = l 



k,k' 



Q[Q in (u)} = / f dQ[Q™]dQ[Q™] dQ[Q° ut ] f[ dQ[Q°f] 6 ^Q in (u) - Q in (u \ {Q° a ut }, {Q°J *}, Q in , Q 



o' = l 



(42) 



fc.fc' 



where <i<2[Q 4, °] := Q[Q l, °}dQ l, ° denotes the probability distribution of the probability Q l, °, i.e. the integrals in Eq. i|42|) 
are done over all probability distributions Q % '° with weight Q[Q' l '°}), and S[.] is a functional delta. 



b,{j} 



Q° ut (u | {QT%{QThQ in ) =(J dhdgPZf(h)Pt?(g) 6(u-u° { f } (h,g)) 
Q in (« | {Q° Ut }, {QTh Q in , Q in ) = (J dhd 9 P^,(h)P°f(g) 5 (u uf J} (h, g)) \ 



(43) 



b,b',{J} 



k ( / k 

p k{h) = f [] du a QT\u a )5 (h - E O exp -y I E K\ 

k J a=l I \a=l 

p kf(h) = ^eJ duQ in (u) JJ du a QT\u a )5 (h - E«a - ^ exp |-y ^E + 



E U a + bu 



a=l 



The parameter y is the reweighting coefficient which takes into account level crossing of states under cavity iterations, 
and consequently we have added the normalisation factors B l ^°. Since we are interested in zero energy configurations 
we will consider the y — > oo limit, where the reweighting factor filters out all non-SAT configurations. 

Let us analyse first the pure XOR (x = 1) case where the computation can be be done analytically along the lines 
of 20] . In the mixed case, we were not able to write analytic formulae for the complexity, hence we will present the 
single sample analysis. In the case x = 1, the numerical solution of Eq. (|43() indicates that there is a non trivial 
solution for large values of the reweighting y only in the region a > 0.883867. 

Numerically, in the y — > oo limiting case, one observes that probability distributions of biases peak into functional 
forms (|26|l . with a fraction of trivial probability distributions peaked on 5(u) and a fraction of non trivial ones. 
Therefore the order parameters can be parametrised via two scalar probability distributions of the 77's variables 
Pout (77) and p- m (v) that take the form: 



Pout(v) = r out S(r/ - 1) + (1 - r out )p out (ri) 
Pm(v) = n n S{r/ - 1) + (1 - r in )p in (r)) 



(44) 



where the r's are the fraction of trivial cavity biases. The non trivial cavity biases are characterised by a distribution 
p which in the limit y — > 00 converges to the delta function in -q = 0. 

Looking at the self consistency equations l|43|l . the only way one can obtain a non trivial distribution Q ln (u) is 
when both P%f(h) + 6(h) and P$%,(g) ^ 6(g). The probability that the field P out (0) = P in (0) = is given by 



the probability of picking all k neighbouring Q(u) = 6(u) 



respectively r 



k 

out 



for P in (0) and r* ut r? n 



for P out (0). 
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Putting everything together and averaging over the Poissonian/Bimodal degree distribution defined in Eqs. ©, we 
obtain: 



P out (0) = 



P in (0) 



E 



k=0 



k\ ° U 



j>*(6;l) + (l -«)«(&; 0)]» 



e -2a(l-r out ) + ! _ a ) 



\fc=0 



E 



fc! 



_ _-2a(l-rout) 



(45) 



Now we can close the equations on the rs, considering that the probability of having a non-trivial Q(u) is given by 
the probability that none of the other two incoming distribution is trivial: 



1-rW = [l-P° Ut ( 

l-r in = [l-P out ( 



1 - e - 2 «(l-''o„t) r . n )] e -2a(l-r„ t ) 

2 



[l-P in (0)] 

2 = [l - [1 - a(l - r ln )] e - 2Q ( 1 "'-o U t) 



(46) 



1, whereas above ad a non trivial solution 



As anticipated, for a < aa '■= 0.883867 the only solution is r out = n, 
appears. 

It is easy to show that Eas. l|46|l are equivalent to the leaf removal ones (and indeed they give the same threshold). 
Indeed, the probability for an incoming (into a node regulated by a given constraint) probability distribution of biases 
not to be trivial must be equal to the probability that both input nodes are not leaves in the pure XOR case; while 
the probability for an outcoming distribution of biases not to be trivial will be the probability that the remaining 
input node and the output node are not leaves. This translates immediately into relations 



l-r ou t = (l-£out)(l-*in) 

1-nn - (1-^in) 2 

Substituting 147|) into (|46|l one immediately retrieves LR relations. 
In the y — > oo limit the complexity is given by [12t l2ll l22| 



E = -y$(y). 



(47) 



(48) 



The free energy is given by $ = <I>i — 2a$2 with: 



*i = --(lnf f Q in (u) h duT\ duiQ out (ui)e y (^= lUa+bu ^~ v ^= 1 KHH) ] \ 

y \ v t\ J/ kMv} 

$ 2 = -i ^ln (J dhP in (h)duQ in (u)e { 



) in ( T,\ P (-y\ u \-y\ h \+y\ u + h \) 



(49) 



k,b,{r/} 



where the averages are taken with respect to the Poissonian distribution of k defined in Eq. and with respect to 
p out (?7) and p in (?7). 

Following the standard computation outlined in App.[B]we obtain for the complexity of ground states: 



E = lim y$(y) = In 2 



1 - 2a(l - tout) - e - 2Q ( 1 -*-)(l - a (i - t in )) 



(50) 



where ti >0 are solutions of Eq. 1)46(1 . In Fig.^Jwe show the complexity profile in the clustering region 0.883867 < a < 1, 
for the pure XOR model {x = 1): At a — 0.883867 it jumps discontinuously to a non-zero value and than decreases 
continuously with growing a, until it vanishes at alpha — 1. Note that the fixed-point entropy, i.e. the normalized 
logarithm of the number of fixed points, is always given by (1 — a) In 2 as calculated using the BP algorithm, and 
thus does not show any discontinuity at the clustering transition. The observation that the complexity in Fig. II II is 
smaller than this entropy demonstrates that each cluster itself is exponentially large. 

The general case, where canalising and non-canalising function (x ^ 1) are taken into account simultaneously, 
cannot be solved completely analytically. Nevertheless it is possible to iterate Eq. I|40|l numerically on single samples, 
and give an estimate of the full phase diagram (a, x) including the onset of the clustered phase. 

In Fig. E| we show the phase diagram a,x of the clustered (1-RSB) and non-clustered (RS) regions, obtained by 
scanning vertical slices (i.e. at fixed a) of the parameter space. For each point we have analysed 100 samples of 
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FIG. 11: Complexity E vs a. Dots correspond to the average value obtained by running the SP equations on 100 samples of 
N = 100000. Errors are of the size of points. The continuous line is the analytic result. 



N = 100 000. The x-error is estimated by the interval ranging from the point where all the 100 samples show zero 
complexity, to the point where all of the samples have finite complexity. For the a-error we have used the same 
technique scanning horizontally at x fixed to the corresponding measured point. 

To go beyond the case of real K = 2 functions (i.e. AND-OR and XOR class functions), we consider first a mixture 
of K — 2 non canalising functions with K = 1 functions, intended as K — 2 functions depending only on one variable 
(see second group in Tab.P). Here the combinatorial analysis can be done analytically following closely the techniques 
presented for the pure XOR (x = 1) case. In this particular mixed case Eq. I|46|l becomes: 



r Q ut = 1 - x [1 - a(l - r in )] 



1 _ e -(l+*)a(l-r ou t.) 



-(1-X) 



1 _ e -(l+a:)o(l-raut) 



= 1 - x \l - a(l - r in ) e - (1+2:)Q(1 -' w) l - (1 - x) \{l - a)(l - r in )e- (1+a:)Q(1 - r< " rt) 



(51) 



where x is now the fraction of K = 1 functions. Again, as in the case of the pure XOR model, at each value of x 
one can find the corresponding value of a where both i; n ,i out are non-trivial. This transition line is the blue dotted 
line displayed in Fig. ^| Surprisingly enough, it lies very close to the AND-OR 2-XOR transition line, but a precise 
measure performed at a = 0.95 exclude that the two transition lines are the same. 

The number of clusters is increasing with x and reaches the maximum, at any a £ [ad, 1], exactly in the case of the 
pure XOR model at x = 1. This result is displayed in Fig. ^5] where we plot the complexity S vs. x at different values 
of a. It is interesting also to note that the curves do not depend on the actual relative concentration of representative 
of the AND-OR family, but only on the ratio x between canalising and non canalising functions. As a limiting case, 
we have chosen a mixture of XOR and only direct AND functions (column 9 of Tab. UJ), and the result is the same 
obtained with a flat distribution on the AND-OR family. The only difference in the latter case is that the system 
become magnetised as a result of the lack of symmetry between and 1 which is present in the flat ensemble of 
canalising functions. Note that also in all these cases the entropy is given by the BP value (1 — a) In 2, and remains 
completely analytic as a function of a and of the composition of Boolean functions. 



V. BEYOND K = 2 



As already mentioned, the restriction to K = 2 functions is a purely technical one: the relatively small num- 
ber of 2 2 = 16 functions allows a complete characterisation into the four discussed classes, and it is possible to 
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FIG. 12: Phase diagram a,x of the clustered (1RSB) and non-clustered (RS) regions. The red continuous line is a guide to 
eye joining in a smooth way the experimental points. It represents the phase boundary of the mix canalising/non canalising 
K — 2 functions. The blue dotted line is the analytic phase boundary of a mixture of K = 2 XOR family and K=l functions 
(see second group in Tab.^J. 

exhaustively analyse these classes together with arbitrary mixtures. The number of functions is, however, growing 
super-exponentially as 2 2 with the input size K, making this exhaustive strategy intractable already for very small 
K. The general case of K > 2 needs, in addition, an extension of the classification into canalising and non-canalising 
functions. A natural extension based on the symmetries of Boolean functions are the so-called NPN functions: equiva- 
lence classes are closed under negation of inputs, permutation of inputs, and negation of the output. It is very easy to 
check that this classification reproduces exactly the four K = 2 classes discussed before: we simply take one member 
of the class, and apply the NPN transformations to reach every other element in the class, and none outside. To give 
an example, the AND function can be transformed into the OR function by negating both inputs and the output. 
NPN equivalence classes are also a natural extension of the classification scheme from K = 2 to K > 2 in the sense 
that the clustering phenomenon does not depend on the relative appearance of functions inside each class, but just 
on the relative appearance of different classes. 



In the K = 3 case, the 2 2 = 256 functions can still be classified completely: there are 14 classes, 4 of them have 
already been discussed in the context of K = 2, and only 10 lead to true K = 3 functions. Each of the 256 functions 
can be pictorially represented on a unit cube (a K-cube for generic K): The Cartesian position of any vertex of 
the cube gives the K input bits, and the output of the function is represented by a colouring of all vertices: a filled 
vertex corresponds to the value 1 of the output variable, an empty vertex corresponds to the case where the value 
is the satisfying one. Out of one such function, the NPN equivalence class can be constructed via the following 
transformations: exchange of two parallel planes (— negation of the corresponding input variable), permutations of 
the Cartesian coordinates (= permutation of input variables) and inversion of all vertex colours (= negation of the 
output). Representatives of all actual K = 3 classes are shown in Fig. together with the 2-XOR function. Note 
that the independence of a function on one of the variables can be easily seen in this representation by two parallel 
planes carrying identical colours. 

For each K there are in total 
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K = 3 




(52) 
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FIG. 13: Complexity E vs. concentration ol canalising functions x at different values of a inside the clustered phase. 
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FIG. 14: Function representatives of the 10 true K 
replica symmetry breaking order. 



3 classes plus pure K = 2 XOR. Function are organised in decreasing 



true i^-functions, leading to A/3 = 218. The number of boolean functions present in each type is reported in Tab. ITU 
Function are organised in decreasing replica symmetry breaking order: function types that undergo the ground-state 
clustering transition earlier in a are drawn first. Complexity curves for BN including purely functions from one class 
are shown in Fig. El As for K — 2, each curve is averaged over 100 samples at N = 10 5 . Within one NPN class, 
functions where chosen at random with uniform probability. We tested that sample to sample fluctuations are small 
(smaller than the points size) already for N — 10 4 . Finite size effect become negligible beyond N — 10 5 . 

Generally, for function types that break replica symmetry earlier, the complexity is higher. This rule has an 
exception for types 3 (see Fig. I15fl . Complexity curves for each function type do no intersect below the critical 
point at a = 1, where all complexities vanish simultaneously. Function types 5, 6 and 7 seem to have a complexity 
degeneracy along the whole £ curve. The degeneracy is resolved for type 7 with tests at N = 10 6 , but persist for the 
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TABLE II: Classification of the K = 3 boolean functions. 




FIG. 15: Ground state complexity curves vs a. Inset: enlargement of the region close to a = 1: no crossing is observed. 



other two types. We do not expect this degeneracy to be exact, but numerical values on an average of 100 N = 10 5 
samples are not distinguishable, as well as on a test at N — 10 6 . 

It is also possible to rank function types according to their "distance" from the pure XOR node. There are, 
however, many non-rigorous measures of distance. One intuitive example is the number N co i of cube links connecting 
two vertices of identical colour. Another possibility is looking at the XOR-like nature of the node on sub-planes, 
i.e. with one fixed input variable. Both indicators are shown in Tab. ITT1 

3 — AND — OR function types are the only ones that do not show a complexity transition. They are also the 
completely canalising nodes, i.e. canalising in every variable. 

An exhaustive test of all possible mixtures of function types goes beyond the scope of this work. In order to test 
whether any small fraction of any RSB-responsible function type leads to an (albeit small) RSB transition close to 
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FIG. 16: Ground state complexity curves vs a for nodes types 9 (red), 13 (pink), 4 (dark blue), 3 (green), 5 (light blue) for 
different increasing values of F V , A - Type 9,13,4: F V , A = 0.0,0.1,0.2,0.3,0.5. Type 3,5:K/,a = 0.0,0.1,0.2,0.3. 

a = 1, we calculated the complexity for the 5 nodes types for which clustering is less pronounced, mixing each type 
with a given fraction F v .a of pure AND-OR nodes. As expected, the transition point shifts continuously toward a = 1 
for each case, and the complexity curve lowers, nevertheless without disappearing for a < 1. We conjecture therefore 
a complex phase to be always present in the large N limit, besides in the single boundary case of -Fv,a = 1< Some 
complexity curves are shown in Fig. 1161 



For if > 4, we have done some non-exhaustive numerical checks, and extended the analytical calculation of the 
pure XOR-type model. The main results are: 

(i) The minimum of the clustering point ad over all combinations of functional classes is always given by the pure 
if-XOR class (the only one being completely analytically accessible). We find ay — 0.782, 0.699 for if = 3, 4 
and ad = (In if + lnlnif + In In In if + ...)/if for if 3> 1. Note that the range of the complex phase becomes 
larger with if, and asymptotically approaches zero. 

(ii) We conjecture that, for fixed if, the only class not leading to clustering is the pure if -AND-OR one. This has 
been checked explicitly for if = 2, 3, and non-exhaustively also for if = 4. This class becomes, however, both 
combinatorially and biologically less important for higher if, where threshold-like functions are expected to be 
more relevant. 

The non-exhaustive numerical checks for the if = 4 case where done on threshold-like functions (selected due to the 
conjectured biological relevance at high if values), and on a subset of 10 NPN classes that are expected to undergo a 
less pronounced clustering transition on the basis of the if = 3 insight. We always found a transition point close to 
but distinct from a = 1. 

Moreover, it is clear from the nature of the nodes (see again Fig. 114|) ) that neither local nor global paramagneticity 
of the system (as opposed to complete XOR nodes) needs to be preserved in order to observe clustering. It is therefore 
possible to build a whole zoology of networks that leave great freedom on the numerical value of the number of solution 
clusters AND of the fraction of variables fixed in a preferred direction (active or inactive from the point of view of 
boolean control). 



B. 



K > 4 
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FIG. 17: The computational core of the yeast GRN consists of the four genes which are responsible for the regulation of uptake 
and degradation of nitrogen sources. All of them are transcription factors, regulating (in part jointly) the expression of 26 
further genes which all code for functional proteins. 

VI. APPLICATIONS TO BIOLOGICAL NETWORKS 

After having analysed in great detail the sudden emergence of a computational core and the number and organisation 
of fixed points in random Boolean networks, we are now applying some of the ideas to known gene- regulatory networks. 
The methods used in the analysis of ensembles of random Boolean networks, namely percolation arguments and the 
cavity method, can be translated into algorithms acting on single graphs. This is obvious for the determination of 
the core since its very definition is iterative, but it is also true for the cavity method which can be reformulated in 
terms of message-passing techniques, including belief propagation and survey propagation as given in Eqs. i|3U|) and 
(J2DJ. The results below have to be understood as an illustration for potential applications of our approach to other 
large-scale networks. 

A. The core of the gene-regulatory network of baker's yeast 

One of the largest-scale genome-wide GRN known so far is the one of baker's yeast S. cerevisifE, for which we 
used data from Uri Alon's lab [2j|. For this regulatory network, only the topological structure is known, i.e. a 
list of transcription factors to regulated genes is given, but the specific combinatorial control of the expression level 
depending on the existence and/or concentration of the transcription factors is unknown. 

Therefore, we were able to apply only the strictly topological part of the PER and LR removal procedures, the 
function-specific steps for canalising functions in PER and for non-canalising ones in LR were omitted. 

The computational core found in this way is presented in Fig. El Even if it consists only of four genes, this result 
is highly interesting. The present genes are exactly those four genes in the network which are responsible for nitrogen 
control in yeast. Nitrogen is, however, the most important mineral nutrient for yeast. If it is present in abundance, 
yeast cells live and proliferate independently from each other. Under nitrogen starvation, yeast cells perform some 
cooperative action: Under cell division they grow into multi-cellular filaments, the so-called pseudohyphfE, which are 
capable to invade new, nutrient rich regions of the surrounding medium. The main mechanism hereby is nitrogen 
catabolite repression (NCR) |30l l3lj , which down-regulates proteins responsible for using secondary nitrogen sources 
in the presence of the "preferred" sources. The behaviour of this gene core was studied in detail only very recently 
|32| on the basis of differential equations. 

B. The network of embryonic development in fruit fly 

The segment-polarity genes in fruit fly drosophila melanogaster are the last step of a cascade of regulatory interac- 
tions responsible for the onset of dorso- ventral segmentation during the embryonic stage of development. The stable 
maintenance of segment-polarity gene expression is crucial in the development and stability of the parasegmental 
segment furrows 0,H^|. 

A detailed Boolean model, being able to capture wild type and various mutant gene expression patterns as fixed 
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points of the network, has been recently proposed by Albert and Othmer |^|. The model consists of a network of 60 
variables and 56 function nodes as a representation of a single parasegment of four cells, each one composed of 15 
Boolean variables and 14 function nodes. 

The network is therefore modular and each module can be described in terms of the following set of boolean 
equations: 



1 1 if i mod 4 = 3 or i mod 4 = 

wgt = (CIAi and SLPi and not CIRC) or [wgi and (CIAi or SLPi) and not CIRi] 

WG, = wgi 

en, = or WG l+1 ) and not SLPi 

ENi — erii 

hhi = ENi and not CIRi 

HHi = hhi 

ptCi = CIA, and not EN, and not CIRi (53) 

PTd = ptc, or (PTd and not HHi-i and not HH i+1 ) 

PR, = PTd and [HHi-i or HH i+1 ) 

SMOi = not PTC, or HHi-i or HH i+1 

cii = not ENi 

Cli = cii 

CIAi = Cii and {SMOi or hh,^ x or Wu+i) 

CIRi = Cii and not SMOi and not hhi—i and not /i/ii+i 



where module index i runs from 1 to 4. 

As a first test, we have applied the topological part of the LR+PER algorithm determining the computational core 
of the BN defined in Eqs. (|53|l . We have found that the whole network, except for the nodes {-PTCj}i=i,...,4, forms 
the computational core. We find ourselves thus in the opposite situation of the yeast network, where only the small 
nitrogen regulatory module belongs to the CC. This difference is even more astonishing since the BN ()53|l is only a 
small sub-network of the full gene-regulatory mechanism in drosophila. Our observation underlines therefore the more 
complex structure of GRNs of higher organisms compared to simpler ones. 

BN (I53|) is known to have 10 solutions, among which only one represents the wild type Q. We have applied 
belief propagation to the network. Even if this algorithm is expected to become exact only on large, locally tree-like 
networks, its prediction of about 12 fixed points is rather accurate compared to the exact result. More interesting is, 
however, the result of in silico gene-knockout experiments: Fixing single variables to a value not occurring in any of 
the ten fixed points, BP immediately predicts a negative entropy. Seen the discrete nature of the Boolean variables, 
this result has to be interpreted in terms of non-existing fixed points. BP is thus able to identify essential variables. 
This strategy can be easily extended to in silico experiments restricting more than one variable and thus allows to 
generate valuable hypotheses and suggestions for real experiments. 

In order to recover fixed points, one has to setup ad hoc strategies for fixing variables. BP can be exploited in order 
to iteratively generate solutions of the problem using a decimation technique [T^. Il3l l26| . The idea is the following: 
run BP, fix the most biased Boolean variable (i.e. the variable having the highest probability to be or 1) to its more 
probable Boolean value, and iterate the previous 2 steps until all spins are fixed. Surprisingly enough, the decimation 
process based on BP leads to the wild-type expression pattern - which is also the one having the largest basin of 
attraction from the dynamical point of view. 

We have also checked the robustness of the fixed points of Eqs. ()53JI with respect to noise. In a Boolean setting, 
the latter amounts to fluctuations in the output of the Boolean functions. We have checked this in the drosophila 
network: noise is introduced by replacing the term 1 — E a ({xi}i^ a )/2 with the finite-temperature Boltzmann term 
exp[— E a ({xi}i£a)/T]. For not too large formal temperature T, we always recover the correct wild-type solution under 
decimation. Increasing continuously the noise, the distance of the BP solution to its zero-temperature counter part 
increases slowly. This technique can be used in order to prune out of the whole set of solutions those that are most 
sensitive to perturbations. 

Finally, we have also run the SP algorithm on this network, but we did not find any sign of clustering solutions 
(even in case we relax the initial conditions, letting free all variables SLPi - see the first of Eq. J53J)). 




if i mod 4=1 or i mod 4 = 2 
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VII. CONCLUSION AND OUTLOOK 



0.8839 




FIG. 18: Phase diagram for K = 2: The green line gives the percolation transition for the PER core, the blue one for the LR 
core. The red curve results from the combination of LR and PER, and merges in the tricritical point (0.785,0.636) with the 
LR curve. Below this point, the transition is continuous, above discontinuous in the core size. The full black line indicates the 
RSB transition from an unstructured solution set to a highly clustered one. 

In this paper, we have studied in detail the structure and organisation of fixed points in large random Boolean 
networks. To do so, we have reformulated the problem in terms of a constraint-satisfaction problem, and we have 
applied various statistical-physics tools ranging from percolation arguments to the cavity approach from spin-glass 
theory. The main results of these approaches are summarised in Fig. 1181 

• Networks of small density a of regulatory functions are dominated by logical cascades and under-constrained 
variables. At a certain density, an extensive computational core of the network emerges suddenly, containing 
in particular non-trivial feed-back loops. The existence of such a computational core is a necessary but not 
sufficient condition for a non-trivial global behaviour of the network fixed points. 

• At a higher density, the set of all solutions undergoes a clustering transition, as schematically represented in 
Fig. Below the transition line, all fixed points are collected in one large connected cluster. Above the 
transition, an exponential number of these clusters exists, each of them containing still an exponetial number 
of solutions. Any two of these clusters have an extensive Hamming distance. This phenomenon is found to be 
robust with respect to the selection of the Boolean functions, and covers an increasing part of the phase diagram 
if we go to a higher number K of inputs to the Boolean functions. 

All the methods applied to random Boolean networks have also been formulated as algorithms which can be applied 
to analyse arbitrary networks. As a demonstration of possible applications, we have run these algorithms on two 
biological gene-regulatory networks, namely those of yeast and of the segment-polarity genes in fruit fly. We have 
first confirmed that the method is able to identify a biologically sensible computational core of both networks. The 
message passing algorithms based on the cavity method {i.e. belief and survey propagation) could be applied only to 
the fruit-fly case where the Boolean regulation is known: the number of fixed points was predicted rather accurately. 
In some in silico gene-knockout experiments, belief propagation was further on able to identify essential genes. 

There are, however, various open questions. First, our analysis is based on ground states of a Hamiltonian counting 
the number of violated functions, which is not directly related to any biologically motivated dynamics. The dynamical 
accessibility of fixed points remains an open challenge which will be addressed in a future project. Moreover, noise 
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sources present in real cases may force the network to quasi-stationary points close to the fixed ones (some regulation 
might not always function properly, without affecting the overall state of a cell). In this view, the study of the 
organisation and accessibility of meta-stable states in the region of low complexity and in the case of fixed external 
inputs, together with their relation with quasi-stationary points, might be relevant. 

A second major direction for future research should be the application to more complex biological cases, which 
is expected to give results going beyond the simple test cases reported in this work. However, very few genome- 
wide networks are available so far. In particular, for multi-cellular organisms only small functional modules for 
well-described functions are known. Large-scale networks known for yeast [l8L |29| and E. coli [34[ contain only the 
topological structure, not an extensive description of the regulatory functions. It is thus highly interesting to infer 
gene regulation networks from experimentally easily accessible high-throughput experiments, as e.g. given by genome- 
wide expression profiles [35l |. This inverse problem can be treated with tools similar to the ones used in the present 
analysis. 
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APPENDIX A: ON THE COMPUTATIONAL CORE: COMBINING LR AND PER 



In this appendix we give some technical details on how to determine the action of PER on the LR core, i.e. on how 
to determine the computational core of a random Boolean network with given external condition. 



1. The degree distribution of the LR core 

To start with, we need the degree distribution for the LR core. In this context we characterise every variable by 
the 4-tuple (<if n , d" n , d^ ut , d" ut ) counting the number of in- links coming from canalising and non-canalising functions 
and the number of out-links going to canalising and non-canalising functions. Note that the total in-degree df n + <i™ 
of a variable is either zero or one. We have to distinguish the following cases: 

1. The variable is neither leaf nor isolated, df n + ci[ n + d^ ut + d" ut > 2. The fraction of variables having these degree 
values is given by the generalised Poissonian 

PLn{di n , <C, rfo Ut , d" ut ) = [{1 - ax(l - tin) 2 - a(l - x)(l - t^ n )}S d f a ,oS d ^,Q + ax(l - im) 2 ^^™ ,1 

+a(l - x)(l - tf^Sdf^iSd^fi] x 

xe -2a(l- tout )(l-^n) [Ml - ^X 1 - tout)] d °" t [2^(l - *in)(l " f out (M) 

There we have used, e.g., the fact that an in-link from a canalising function survives if not both inputs of the 
function are simultaneously leaves at a certain point of the removal procedure, which leads to the factor (1 — tf n ) 
in the (5d" ,i<5d" ,o)-te rm i and similar arguments for other in- and out-links. 

2. The variable is a leaf, df n + d" n + d^ ut + d" ut — 1. According to the removal procedure, this is possible only if 
the existing link is an input to an AND/OR- type function: 

P LR (1, 0,0,0) = 

P LR (0, 1,0,0) = 

P LR (0, 0, 1, 0) = e -Mi-to«)(i-*t to ) {1 _ ax{1 _ t . n) 2 _ a(1 _ ^(i _ 4)} 2a(l - x)(l - t out )(l - t in ) 

P LR (0, 0,0,1) = (A2) 

The non-trivial term is a product of the Poissonian contribution in Eq. (|A1I) and the probability (1 — t[ n ) that the 
other input to the adjacent function is not a leaf. The contrary would lead to removal of the Boolean function. 

3. The variable is isolated, df n + d™ n + d^ nt + d" ut = 0. Note that in LR we remove only the functions, and the 
number TV of vertices remains unchanged. The isolated vertices thus collect all originally isolated vertices and 
all those, which were leaves leading to a removal step. We consequently find 

P LR (0, 0, 0, 0) = e -2"<i-W)(i-*tu0 {i _ 03.(1 _ f m f - a(l -x)(l- tf n )} x 

x [1 + 2ax(l - t ia )(l - < out )2a(l - x)(l - *out)*m] 

+ e -2 Q (l-t out )(l-^„) {aj;{1 _ t . n) 2 + _ _ ;2J } (A3) 
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2. PER on the LR core 



Now we can come to the analysis of PER on the LR core. The propagation rules were: A XOR-type function can 
be removed and its output fixed, if both inputs are fixed. An AND/OR fixes its output if either both inputs are 
fixed, or if exactly one input is fixed, but to its canalising value. This iterative rule allows us to give a self-consistent 
condition for the probability ir c (resp. 7r„) that an AND/OR type (resp. XOR-type) function fixes its output by PER 
and can be removed. (Note that we are now speaking of the removal and not the survival probability.) For 7r„ we 
find directly 



-i 2 



E 



"out 
<<Ct> 



(A4) 



This equation requires a set of explanations: The brackets (•) denote the average over Plr. The distribution 
P^^(c?? n , d[ l n , G?o Ut , d"ut) thus describes the probability that an input link to an arbitrary non-canalising function 
is coming from a variable of in- and out-degrees (c?? n , d" n , d^ nt , e?" ut ). This function has to be fixed, which appears 
with probability one if the function is not regulated, with probability 7r c if it is regulated by a canalising function, 
with 7r n if regulated by a non-canalising one. We may unify all three cases in the single expression nt ia 7r^ in , since 
df n + df n = 0/1, i.e. since a variable is never regulated by more than one Boolean function. For an AND/OR-type 
function, we first denote the probability that one input is fixed by p( m ). Using that a variable takes its canalising 
value with probability 1/2, this allows us to write 



7T C 



p(in)p{in) _|_ 2p(m)^ _ ) x 1 



p(in) 



E 



Rut) 



PLR{d C lnl d^ nl d C ontl dl nt ) 7T c in TT n " 



(A5) 



Using the degree distribution Plr determined in the last subsection, we can easily calculate the P^-averages as 
almost complete Poissonian sums 

(Ct) = 2az(l - t out )(l - tin) [l - e -2«(l-t= ut )(l-^„) {1 _ ax{ l _ t . n f _ Q (l _ x){1 _ t 2j|" 

(O = 2a(l - x)(l - t in ) [l - 4ine -2a(i-*ou t )(i-^) {1 _ ax{l _ t . n) 2 _ a{l _ x){l _ t 2j } j _ (A6) 
A similar resummation can be done for the degree sums in the above expressions for 7r c /„, we obtain 

■ {1 _ e -2a (1 - tout )(l-x tin ) }{1 _ ax{l _ t . n) 2 _ a{1 _ x){1 _ t 2j } + ^ nax[l _ t . n) 2 + ^ _ ^ _ " 



1 _ e -te(i-W)(i-«fei) {1 - ax(l - t in ) 2 - a(l - x)(l - tf n )} 
= {1 - t in e- 2 °( 1 - t °~t)(i-^o)}{l - ax (l - ya - q(l - x )(l - t p} + 7r n ax(l - t in ) 2 + 7r c a(l - x)(l - tj 
* c 1 - i in e-2«(i-to Ut )(i-^„) {i _ ax (i _ < in )2 _ a (! _ _ t 2j} 

These equation can be simplified using the Eqs. I|8I9[) for the parameters of the LR core, 

"tj n (l - t out ) + TT n ax(l - < in ) 2 + 7T c a(l - x)(l - tfj" 1 



A7) 



^out(l ^in) 

tjn(l ~ t out t in ) + ir n ax(l - tin) 2 + 7T c q(1 - x)(l - tf n ) 
^out(l ^out^in) 



(A8) 



These equations can be studied to analyse the emergence of a non-trivial computational core of the random Boolean 
network, in particular the continuous transition to an extensive core below the tricritical point in Fig. [S] can be 
understood by linearising the first equation in 7r„ and ir c . Here we do not give the steps of this feed-forward calculation. 
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APPENDIX B: THE COMPUTATION OF THE COMPLEXITY E 

We will derive in some details the computation of the entropy S at x = 1. Let us compute first the $1 contribution: 

!U«a+M-j/£!:=i KHH) ] \ 



*i = -^ln^y*g is (tt) 6 duJJdttiQ out (ui)e , '( | SJ= 



oo 

a Y. e 

k=0 



i=i 



i=i L i=i 



(1 - a(l - tin))E e_2a T / II^P OUt (^)dw in Wln 



z=i 
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+ 



1=1 



2 n <^)-n 



= ln2(l - a(l - i in )) (l - 2a(l - i Q ut) - e^ 2 ^ 1 " 4 ^") - 2a 2 (l - t in ){l - t out ) 
The $2 contribution is computed along the same lines of $1, and is given by: 



In 2 



$ 2 = -ln(2)(l-i il 



1 - e 



-2e»(l-t ou t) 



(Bl) 



(B2) 



Putting everything together one obtains for E the result shown in Eq. (|5fl)) . 
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